A Holistic Methodology for Keyword Search in Historical Typewritten Documents
نویسندگان
چکیده
In this paper, we propose a novel holistic methodology for keyword search in historical typewritten documents combining synthetic data and user's feedback. The holistic approach treats the word as a single entity and entails the recognition of the whole word rather than of individual characters. Our aim is to search for keywords typed by the user in a large collection of digitized typewritten historical documents. The proposed method is based on: (i) creation of synthetic image words; (ii) word segmentation using dynamic parameters; (iii) efficient hybrid feature extraction for each image word and (iv) a retrieval procedure that is optimized by user's feedback. Experimental results prove the efficiency of the proposed approach.
منابع مشابه
Efficient segmentation-free keyword spotting in historical document collections
In this paper we present an efficient segmentation-free word spotting method, applied in the context of historical document collections, that follows the query-byexample paradigm. We use a patch-based framework where local patches are described by a bag-of-visual-words model powered by SIFT descriptors. By projecting the patch descriptors to a topic space with the Latent Semantic Analysis techn...
متن کاملResolving Student Entities in the Facebook Social Graph
Despite the popularity of social networking sites and the abundance of information available on the internet, finding a holistic overview of an individual remains difficult. Profiles are often fragmented across sites, so users must issue queries to several different services and manually combine the results. Search engines like Google and Pipl utilize keyword indices to suggest a list of potent...
متن کاملFuzzy retrieval of encrypted data by multi-purpose data-structures
The growing amount of information that has arisen from emerging technologies has caused organizations to face challenges in maintaining and managing their information. Expanding hardware, human resources, outsourcing data management, and maintenance an external organization in the form of cloud storage services, are two common approaches to overcome these challenges; The first approach costs of...
متن کاملAn Effective Path-aware Approach for Keyword Search over Data Graphs
Abstract—Keyword Search is known as a user-friendly alternative for structured languages to retrieve information from graph-structured data. Efficient retrieving of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in the keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
متن کاملKeyword Searching for Arabic Handwritten Documents
In this paper we present a system for searching keywords in Arabic handwritten and historical documents using two algorithms, Dynamic Time Warping (DTW) and Hidden Markov Models (HMM). The HMM based system provides satisfying results when it is possible to provide adequate training samples (which is not always possible in historical documents). The DTW algorithm with a slight modification provi...
متن کامل